Abstract

The churn rate of a peer-to-peer system places direct limitations on
the rate at which messages can be effectively communicated to a group 
of peers. These limitations are independent of the topology and message 
transmission latency. In this paper we consider a peer-to-peer network, 
based on the Engset model, where peers arrive and depart independently 
at random. We show how the arrival and departure rates directly limit 
the capacity for message streams to be broadcast to all other peers, by 
deriving mean field models that accurately describe the system behavior. 
Our models cover the unit and more general k buffer cases, i.e. where a 
peer can buffer at most k messages at any one time, and we give results 
for both single and multi-source message streams. We define coverage rate 
as peer-messages per unit time, i.e. the rate at which a number of peers 
receive messages, and show that the coverage rate is limited by the churn 
rate and buffer size. Our theory introduces an Instantaneous Message 
Exchange (IME) model and provides a template for further analysis of 
more complicated systems. Using the IME model, and assuming random
processes, we have obtained very accurate equations of the system dynamics
in a variety of interesting cases, which allow us to tune a peer-to-peer
system. It remains to be seen if we can maintain this accuracy for general
processes and when applying a non-instantaneous model.



1 Introduction 

Fundamentally, in a peer-to-peer (P2P) network, messages can only be
exchanged between peers that are in the online state. If a peer is not online
then it is offline, and no messages are exchanged with a peer while it is
offline. When a peer is online it is available to send messages to, and
receive messages from, other peers. By considering each peer to be in one of
these two states and by examining the frequency of state changes over all
peers, we describe the churn of the P2P network. A high churn means a high
frequency of state changes and vice versa. We can bound the total number of
peers, in which case a given peer will continue to alternate between states
over time, or we can allow the number of peers to be infinite and consider a
finite number of online peers, in which case a given peer may never return to
the online state after becoming offline. We call these cases the finite and
infinite population models respectively. In both cases there is an expected
number of peers that are online, at any given time, when the P2P network is
in equilibrium. In this work we focus on P2P networks with a bounded number
of peers because we are interested in how the churn affects the message
dissemination capacity of the network, though we make some remarks about the
infinite population model for interest.

*This work was in part funded by the Australian Research Council, ARC Discovery Project,

Messages may be generated by users, application processes, data sources or 
as the result of control traffic, e.g. stabilization or response to changing network 
conditions, between peers. Using the finite population model we can assume in 
this work that messages are to be disseminated to all other peers, including to 
those that may happen to be offline at the time the message was generated. 
In these circumstances we naturally ask for the time taken for a message to be 
received by all (or a fraction) of the other peers with the understanding that, 
as peers change between the offline and online states, all peers may or may not 
eventually receive the message in question. 

As one example, consider a query message that originates at a peer in an
unstructured P2P network, where each peer in the network may have some data
that is relevant to the query. In a basic network the query message is flooded
by each peer evaluating the query, forwarding the message to other peers and
possibly responding to the query. Typically each peer that receives the query
will delete it after consideration. In the finite population model we ask for the
fraction of total peers that received the query message, what we call the query
message coverage. In the infinite population model we would ask for the absolute
number of peers that received the message. Intuitively, the peers that receive
the query message are those peers that were online while the query message
was being flooded. We can refine our intuition by considering those peers that
changed state, either from offline to online or vice versa, during the time that
the query message was being flooded.

In general, messages may be buffered rather than being deleted immediately 
after processing. If a message is buffered indefinitely by at least one peer then
(for a bounded number of peers) the message will eventually be received by all 
peers, i.e. the coverage will be 100%; this is the case for infinite sized buffers. 
For a large volume of messages, e.g. consider when all peers in the network are 
frequently generating queries, and there are limited resources available at the 
peers, e.g. mobile devices versus desktops, it becomes more practical to place 
a limit on the buffer size. In this case a message is buffered only for some time 
and some peers may not receive the message at all. In this work we examine 
how the size of the buffer relates to the coverage. To do so we also consider the 
message rate.

1.1 Instantaneous message exchange 

To clearly examine the effect of churn and finite buffer size we first eliminate
any effects of the sub-communication system (i.e. the Internet in most cases)
by allowing any number of message transmissions between online peers to take
place instantly; we refer to this as the instantaneous message exchange (IME)
model. In this case, e.g., a message that is generated at a peer that is online is
instantly communicated to all other peers that are also online at the time of
message generation. While this is generally an unrealistic allowance, it allows
us to model how churn and finite buffer size place limits on message coverage
even when the sub-communication system performs ideally.

From another perspective, the IME model is applicable when churn is
sufficiently low relative to the message propagation time through the network,
in the sense that the network appears to be static from the perspective of a
single message propagation.

Aspects such as bandwidth and latency in the sub-communication system 
lead only to further limitations; i.e. in this work we are concerned with how 
churn places a limit on the message coverage and we make ideal or best case 
assumptions about other aspects, which can otherwise only lead to the message 
coverage being further limited. 

1.2 Related work 

To the best of our knowledge there are currently no results that have proposed 
the IME model as a starting point. The dynamics of the IME model can be 
analyzed by taking an epidemic information dissemination approach [4],
sometimes called randomized rumor spreading [5]. Based on analysis of infectious
diseases [1], the two basic models are infect and die and infect forever. In the
infect and die model, a disease or message is communicated for only a single
round and then the peer no longer participates. In the infect forever model the
peer can continue to communicate the message forever. For the very basic case
in the IME model, i.e. a single message broadcast, the infect forever model is
applicable. However, for a stream of messages that are being broadcast,
the situation is a non-trivial combination of infect and die (because of the finite 
buffer limitation) and infect forever (because peers can continue to communicate 
a message until they receive a subsequent message). Also, epidemic information 
dissemination usually assumes that subsequent generations of infected peers are 
selected at random from the population. In the IME model this is not true 
because a peer can only receive a message when it is online. 

The most closely related work is that of Yao, Leonard et al. [10]. They
model heterogeneous user churn and local resilience of unstructured P2P
networks. They also concede early on that balancing model complexity and its
fidelity is required to make advances in this area. They examine both the
Poisson and Pareto distributions for user churn and provide a deep analysis on
this front. Their work focuses on how churn affects connectivity in the network,
whereas we have separated this aspect from our work and concentrated on
message throughput.

Other closely related work concerns mobile and ad hoc networks, and sensor 
networks, because these applications require robust communication techniques 
and tend to have limited buffer space at each node. The recent work of
Lindemann and Waldhorst [6] considers the use of epidemiology in mobile devices
with finite buffers, following the seven degrees of separation system [8]. In
particular they use models for "power conservation" where each mobile device
is ON with probability $p_{ON}$ and OFF with probability $p_{OFF}$. Their analytical
model gives very close predictions to their simulation results. In our work we
describe these states using the arrival rate, $\lambda$, and the departure rate, $\mu$, which
allows us to naturally relate them to a rate of message arrivals, $\alpha$. We focus solely
on these parameters so that we can show precisely how they affect the message
coverage rate.

Other closely related work such as in [7] looks at the rate of file transmission 
in a file sharing system that is based on epidemics. The use of epidemics for 
large scale communication is also reviewed in [9]. The probabilistic multicast 
technique in [3] attempts to increase the probability that peers receive
messages in which they are interested and to decrease the probability that peers
receive messages in which they are not interested. Hence it introduces a notion
of membership which is not too different from being online/offline. Autonomous
Gossiping presented in [2] provides further examples of using epidemics for se- 
lective information dissemination. 

1.3 Organization 

In Section 2 we describe our IME model and show the derivation of equations
that accurately predict its behavior. We compare the analytical results with
simulations. In Section 2.3.4 we examine the use of the model to choose message
rates appropriate for the churn. In Section 3 we provide a derivation of the
k buffer and multi-source cases. We conclude the paper with some overall
observations and future work.

Table 1 provides the notation used in this paper. Generally, when a function,
$f$, is written with a subscript, $f_0$ or $f_1$, the function represents the
offline or the online case respectively, e.g. $n_0$ and $n_1$ in Table 1. Later in the
paper we extend this subscript notation to represent more complicated cases. For
stochastic processes like $X(t)$ we use $\bar{X}(t)$ as the expected value and we use
$x(t)$ as the normalized expected value.

2 IME model and analytical formulation 

We begin this section with a basic description of the IME model using queueing 
theory. We then provide a basic analysis for a single message broadcast. Results 
from the basic analysis are used throughout the paper. Each analytical result is
compared with simulation results.
Table 1: Notation

  Symbol         Meaning
  $N$            number of peers
  $\lambda$      arrival rate of a peer
  $\mu$          departure rate of a peer
  $n_0$          mean number of peers offline
  $n_1$          mean number of peers online
  $t$            time
  $X(t)$         coverage of a single message at time t
  $\alpha$       rate of message arrivals
  $C$            average coverage of a message in a message stream
  $C_{base}$     average base coverage of a message in a message stream
  $C^*$          (extended) coverage rate
  $\xi_T$        fraction of messages of type T (e.g. type L1, 1, 0, etc.)
  $k$            size of the buffer
                 number of source peers



2.1 Instantaneous message exchange model 

Consider a set of peers where each peer, $i \in \{1, \ldots, N\}$, has a state, $s_i \in \{0, 1\}$,
where 0 means offline and 1 means online. We say that a peer is online or offline
to mean which state it has. Let each peer change from offline to online at a
random time $t_0$ according to an "arrival" rate, $\lambda$, such that:

$$P[t_0 < t] = 1 - e^{-\lambda t},$$

where $E[t_0] = 1/\lambda$ is the mean time taken to change state from offline to
online; i.e. $t_0$ has an exponential cumulative distribution function, so that the
state changes of a peer form a Poisson process. Similarly $E[t_1] = 1/\mu$ is
defined as the mean time to change state from online to offline with "departure"
rate, $\mu$. In other words, each peer spends its time offline and online in the
proportion $1/\lambda : 1/\mu$, as shown in Figure 1.
The peers are described by an M/M/c/c/c queueing system (where $c = N$) as
shown in Figure 2.

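Using Figure 2, if we let $n_1$ be the number of online peers, then we can write:

$$P_n = P[n_1 = n] = \binom{N}{n}\left(\frac{\lambda}{\mu}\right)^{n} P_0, \qquad P_0 = P[n_1 = 0] = \left(1 + \frac{\lambda}{\mu}\right)^{-N}.$$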



In future work we shall investigate variations of the Pareto distribution for peer lifetimes. 
Figure 1: Peers are either online or offline; each peer independently arrives and
departs with rates $\lambda$ and $\mu$ respectively.

[State-transition diagram: from state $n$ (number of online peers) the system moves to $n+1$ at rate $(N-n)\lambda$ and to $n-1$ at rate $n\mu$, for $n = 0, \ldots, N$.]

Figure 2: State-transition diagram for the M/M/c/c/c queueing system (where
$c = N$).

Hence the system reaches an equilibrium when the number of online peers is

$$n_1 = \sum_{n=0}^{N} n P_n = \frac{\lambda}{\lambda+\mu} N$$

and equivalently when the number of offline peers is

$$n_0 = \frac{\mu}{\lambda+\mu} N = N - n_1.$$

In this work we assume that the system is in equilibrium and we use $n_0$ and
$n_1$ as continuous variables.
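The following Python sketch is a minimal numerical check of the equilibrium above, written for illustration rather than taken from the authors. It evaluates $P_n$ for the M/M/c/c/c system with $c = N$ and confirms that $\sum_n n P_n = N\lambda/(\lambda+\mu)$. The sketch assumes the standard finite-source product form $P_n = \binom{N}{n}(\lambda/\mu)^n P_0$. This form is a binomial distribution, because each peer alternates between the two states independently. The function names are hypothetical.

```python
from math import comb

def online_distribution(N: int, lam: float, mu: float) -> list[float]:
    """Stationary P[n_1 = n] for n = 0..N in the finite-source (Engset) model.

    P_n = C(N, n) (lam/mu)^n P_0 with P_0 = (1 + lam/mu)^(-N), which is a
    binomial distribution with success probability lam / (lam + mu).
    """
    rho = lam / mu
    p0 = (1.0 + rho) ** (-N)
    return [comb(N, n) * rho**n * p0 for n in range(N + 1)]

def mean_online(N: int, lam: float, mu: float) -> float:
    """Mean number of online peers, sum_n n P_n."""
    return sum(n * p for n, p in enumerate(online_distribution(N, lam, mu)))

if __name__ == "__main__":
    N, lam, mu = 100, 0.5, 1.0
    # Both values should be (approximately) 100 * 0.5 / 1.5 = 33.33...
    print(mean_online(N, lam, mu), N * lam / (lam + mu))
```

For very large $N$ the powers of $\lambda/\mu$ can overflow or underflow in floating point. Working with logarithms would be the safer choice there.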

2.2 Single message broadcast 

Consider the case when a peer, called the source, is chosen uniformly at random
from the $N$ peers and at time $t = 0$ that peer generates a new message. The notion
that messages can be transmitted instantly between online peers is described by
the rule: peer $i$ has received the message by time $t$ if and only if there is some
$t' < t$ such that, at time $t'$, peer $i$ was online together with some other online
peer that already had the message. A peer that remains offline up to (but not
including) time $t$ cannot have received the message before time $t$.

Let $X(t)$ be a continuous time, discrete space, stochastic process that counts
the number of peers with the message by time $t$. We say that $X(t)$ is the
coverage of the message at time $t$. At time 0 the source peer is initially offline
with probability $n_0/N$ and online otherwise. Hence we consider $\bar{X}(t)$, which is
the weighted average of two different coverage functions, $\bar{X}_0(t)$ and $\bar{X}_1(t)$,
for the initially offline and initially online cases respectively.

For generality, we allow messages to be generated even while the peer is
offline. This is required, e.g., when the peer is generating or collecting data
independently of whether it is online or not. The special case when messages
are not generated at a peer that is offline is then a simplification of the
general case.



2.2.1 Source peer starts online 

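In this case, coverage starts at time $t = 0$. Every peer that is online at $t = 0$
receives the message immediately. Each initially offline peer receives it as soon as
it comes online, since for large $N$ some online peer then holds the message.
After time $t_m = m/(\lambda n_0)$ (for some integer $m > 0$) the average number of
peers that have the message is

$$E[X_1(t_m)] = \bar{X}_1(t_m) = n_1 + n_0\left(1 - \left(1 - \frac{1}{n_0}\right)^{m}\right)$$

and in the limit of a large network

$$x_1(t) = \lim_{N\to\infty}\frac{\bar{X}_1(t)}{N} = 1 - \frac{\mu e^{-\lambda t}}{\lambda+\mu}. \qquad (1)$$

In this work we use mean field analysis in terms of $N$, since a P2P network is
expected to consist of a large number of nodes. The technique simplifies the
derivations in some cases; equations in terms of $N$ are however always possible
and we use both forms throughout.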
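Eq. (1) can be checked against a direct simulation of the IME rule. The Python sketch below is our own minimal event-driven simulation. It is not the simulation tool described in the Appendix. It rests on the following assumptions:

- Peers start in equilibrium, and the source (peer 0) starts online.
- Every online peer receives the message instantly whenever some online peer holds it.
- Departing peers keep their copy.

The function names, the seed and the tolerance used in the check are arbitrary choices.

```python
import heapq
import math
import random

def x1_mean_field(t: float, lam: float, mu: float) -> float:
    """Eq. (1): normalized coverage when the source starts online."""
    return 1.0 - mu * math.exp(-lam * t) / (lam + mu)

def simulate_x1(N: int, lam: float, mu: float, T: float, rng: random.Random) -> float:
    """One IME run for a single message whose source (peer 0) starts online.

    Returns the fraction of the N peers that hold the message at time T.
    """
    p_on = lam / (lam + mu)
    online = [rng.random() < p_on for _ in range(N)]   # equilibrium start
    online[0] = True                                   # the source starts online
    has = online[:]                                    # all online peers get it at t = 0
    holders_online = sum(has)
    # Next state change of each peer: exponential with rate mu (online) or lam (offline).
    events = [(rng.expovariate(mu if online[i] else lam), i) for i in range(N)]
    heapq.heapify(events)
    while events[0][0] <= T:
        t, i = heapq.heappop(events)
        if online[i]:
            online[i] = False                          # a departing peer keeps its copy
            if has[i]:
                holders_online -= 1
        else:
            online[i] = True
            if has[i]:
                if holders_online == 0:
                    # A returning holder instantly informs every peer that is online.
                    for j in range(N):
                        if online[j] and not has[j]:
                            has[j] = True
                            holders_online += 1
                holders_online += 1                    # count the arriving holder itself
            elif holders_online > 0:
                has[i] = True                          # instant exchange on arrival
                holders_online += 1
        heapq.heappush(events, (t + rng.expovariate(mu if online[i] else lam), i))
    return sum(has) / N

def mean_x1(N: int, lam: float, mu: float, T: float, runs: int, seed: int = 1) -> float:
    """Average of simulate_x1 over several independent runs."""
    rng = random.Random(seed)
    return sum(simulate_x1(N, lam, mu, T, rng) for _ in range(runs)) / runs

if __name__ == "__main__":
    print(mean_x1(1000, 1.0, 1.0, 1.0, runs=20), x1_mean_field(1.0, 1.0, 1.0))
```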



2.2.2 Source peer starts offline 

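If the source peer starts offline then it becomes online at time $t$ with probability
density $\lambda e^{-\lambda t}$, i.e. at an average time of $1/\lambda$. Hence $\bar{X}_0(t)$ is the
convolution with $\bar{X}_1(t)$:

$$\bar{X}_0(t) = \int_0^t \bar{X}_1(t-\tau)\,\lambda e^{-\lambda\tau}\,d\tau + 1 - \int_0^t \lambda e^{-\lambda\tau}\,d\tau \qquad (2)$$

$$x_0(t) = \lim_{N\to\infty}\frac{\bar{X}_0(t)}{N} = 1 - \frac{e^{-\lambda t}\left(\lambda + \mu + t\lambda\mu\right)}{\lambda+\mu}$$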
The constant 1 and the last integral term together, $1 - \int_0^t \lambda e^{-\lambda\tau}\,d\tau = e^{-\lambda t}$,
represent the diminishing constant which accounts for the "fraction of the source
peer" that has not yet become online. Before the source peer becomes online, or
more specifically at time $t = 0$, $\bar{X}_0(0) = 1$. As $t$ increases, the probability that
the source peer has become online increases and so does the average coverage,
where the first term accounts for the fraction of the source peer that has become
online.
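As a sanity check on the closed form for $x_0(t)$, the Python sketch below evaluates the convolution in Eq. (2) numerically using the normalized $x_1(t)$ of Eq. (1). In the large-$N$ limit the "$+1 - \int$" term, which equals $e^{-\lambda t}$, is divided by $N$ and vanishes, so only the convolution is integrated. Simpson's rule is our choice of quadrature, and the function names are illustrative.

```python
import math

def x1(t: float, lam: float, mu: float) -> float:
    """Eq. (1): normalized coverage when the source starts online."""
    return 1.0 - mu * math.exp(-lam * t) / (lam + mu)

def x0_closed(t: float, lam: float, mu: float) -> float:
    """Closed form of lim_{N -> inf} X_0(t) / N."""
    return 1.0 - math.exp(-lam * t) * (lam + mu + t * lam * mu) / (lam + mu)

def x0_convolution(t: float, lam: float, mu: float, steps: int = 2000) -> float:
    """Simpson's rule for int_0^t x1(t - tau) * lam * exp(-lam * tau) dtau."""
    steps += steps % 2                       # Simpson's rule needs an even step count
    h = t / steps
    total = 0.0
    for k in range(steps + 1):
        tau = k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * x1(t - tau, lam, mu) * lam * math.exp(-lam * tau)
    return total * h / 3.0

if __name__ == "__main__":
    for t in (0.5, 1.0, 3.0):
        print(t, x0_convolution(t, 1.0, 1.0), x0_closed(t, 1.0, 1.0))
```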

2.2.3 Expected coverage 

The expected coverage is the weighted average of Eqs. (1) and (2):

$$x(t) = \frac{\mu}{\lambda+\mu}x_0(t) + \frac{\lambda}{\lambda+\mu}x_1(t) = 1 - \frac{e^{-\lambda t}\mu\left(\mu + \lambda(2 + t\mu)\right)}{(\lambda+\mu)^2} \qquad (3)$$

Of course, the observed coverage either starts from time $t = 0$ or starts at
an average time of $1/\lambda$. The expected coverage in Eq. (3) represents the average
of these two cases. Figure 3 shows 10 simulation runs when $\lambda = \mu = 1$ and
$N = 1000$. The simulation tool is described in the Appendix for reference. The
points in a series indicate times when a new peer received the message. The
solid lines are $\bar{X}_1(t)$ and $\bar{X}(t)$ (the functions are plotted starting from time 0.1).
Note that $\bar{X}(t)$ is only an average, and is not representative of either the online
or the offline case.
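The following Python sketch shows that the closed form in Eq. (3) is the weighted average of the two start cases. It assumes only the normalized expressions for $x_1(t)$ and $x_0(t)$ given above, and the function names are illustrative. At $t = 0$ the expression reduces to $(\lambda/(\lambda+\mu))^2$. That value is the probability that the source is online multiplied by the online fraction $n_1/N$.

```python
import math

def x_expected(t: float, lam: float, mu: float) -> float:
    """Eq. (3): expected normalized coverage of a single message."""
    return 1.0 - math.exp(-lam * t) * mu * (mu + lam * (2.0 + t * mu)) / (lam + mu) ** 2

def x_weighted(t: float, lam: float, mu: float) -> float:
    """Weighted average of the offline-start and online-start cases."""
    x1 = 1.0 - mu * math.exp(-lam * t) / (lam + mu)
    x0 = 1.0 - math.exp(-lam * t) * (lam + mu + t * lam * mu) / (lam + mu)
    return (mu * x0 + lam * x1) / (lam + mu)

if __name__ == "__main__":
    print(x_expected(0.0, 1.0, 1.0))   # (1/2)^2 = 0.25 when lambda = mu
```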








Figure 3: Simulation runs (points) ($\lambda = \mu = 1$, $N = 1000$) and theoretical
curves (solid lines) for $\bar{X}_1(t)$ and $\bar{X}(t)$, plotted against time $t$.
2.2.4 Basic simulation results



2.3 Multiple message broadcast - unit message buffer 

Let messages $\{1, 2, \ldots\}$ be generated at a source peer with rate $\alpha$ and consider
a sequence of generation times $\{m_1, m_2, \ldots\}$. Apply the rule that each peer
discards a current message in favor of a newer message and does not receive
messages that are older than the current message. A message is said to be
skipped by a peer if that message is not received by the peer because a newer
message has already been received. Figure 4 shows an example realization in
time of events that occur at the source peer. It also shows a number of numerical
quantities that are important for the equations. A solid vertical line represents
the source peer moving from offline to online. A dashed vertical line represents
moving from online to offline. Arrival of a new message is shown by a ×, and
the message numbers are given at the bottom of the figure for reference.

Figure 4: Example realization in time of the source peer that changes between
online and offline (messages numbered 1 to 10 along the time axis). Arrival of a
message at the source peer is depicted by a ×.

We are interested in computing the average coverage of a message in a message
stream such as that shown in Figure 4. Since messages are discarded in favor of
new ones, the coverage $X(t)$ of any given message is limited. If $C_i(\alpha; \lambda, \mu)$ is
the coverage of message $i$ in the message stream of $M$ messages then we define

$$C = \lim_{M\to\infty}\frac{1}{M}\sum_{i=1}^{M} C_i(\alpha; \lambda, \mu)$$

as the average coverage of a message in a message stream.

In Figure 4 note that message 1 arrives while the source peer is online;
therefore it is immediately received by all other peers that are online and the
coverage of that message at time $m_1$ (its arrival time) is immediately $n_1$. Shortly
after time $m_1$, the source peer goes offline. Message 1 continues to be transmitted
to new peers that enter the network and its coverage increases. The arrival
of messages 2, 3 and 4 does not hinder the transmission of message 1 because
the source peer is still offline. The coverage of these messages is exactly 1.
However, shortly after the arrival of message 4, the source peer goes online. At
this point, all online peers receive message 4 and the coverage of message 4 jumps
to $n_1$. The coverage of message 4 continues to grow until the arrival of message 5,
which is immediately transmitted to all online peers (overwriting message 4 due
to the unit buffer restriction). Message 5's coverage increases until the arrival
of message 6, and so on.

Note that messages 1, 6 and 8 are the last messages to arrive in each interval
for which the source peer is online. These messages continue to increase their
coverage until the source peer moves back online with a newer message that
arrived in the meantime. E.g., if message 7 had not arrived then message 6 would
have continued to grow until message 8 arrived. Also note that messages 2, 3
and 9 were not received by any other peers and never will be; their coverage
remains at 1.

We identify four categories of messages (the naming scheme simplifies our
notation later), listed in Table 2.

Table 2: Message categories

  L1   A message that is the last one to arrive before the peer moves
       from online to offline (e.g. messages 1, 6 and 8).
  1    Messages that arrive while the peer is online but are not L1
       messages (e.g. message 5).
  L0   A message that is the last one to arrive before the peer moves
       from offline to online (e.g. messages 4, 7 and 10).
  0    Messages that arrive while the peer is offline but are not L0
       messages (e.g. messages 2, 3 and 9).

When only one message arrives in an interval (by interval we mean either
the online or the offline time interval) then that message is an L1 or L0 message
(i.e., in this case there are no 1 or 0 messages in that interval).
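The categories in Table 2 depend only on the order of message arrivals and state changes. The Python sketch below labels every message of a realization. It uses a hypothetical encoding in which `'m'` is a message arrival and `'t'` is a state change of the source peer. The realization must end with a state change, so that the last message of every interval is known. The example string is our reading of Figure 4 from the text, and the function name is illustrative.

```python
def classify(events: str, start_online: bool) -> dict[int, str]:
    """Label each message of a source-peer realization as 'L1', '1', 'L0' or '0'.

    events: 'm' = message arrival, 't' = the source peer toggles between online
    and offline. Messages are numbered 1, 2, ... in order of arrival.
    """
    labels: dict[int, str] = {}
    online = start_online
    pending: list[int] = []          # messages of the current interval, in order
    count = 0
    for ev in events:
        if ev == "m":
            count += 1
            pending.append(count)
        elif ev == "t":
            kind = "1" if online else "0"
            for msg in pending[:-1]:
                labels[msg] = kind   # not the last message of the interval
            if pending:
                labels[pending[-1]] = "L" + kind
            pending = []
            online = not online
        else:
            raise ValueError(f"unknown event {ev!r}")
    if pending:
        raise ValueError("realization must end with a state change")
    return labels

# Figure 4 as read from the text: online: 1 | off: 2 3 4 | on: 5 6 | off: 7 | on: 8 | off: 9 10 | on
labels = classify("mtmmmtmmtmtmtmmt", start_online=True)
print(sorted(m for m, k in labels.items() if k == "L1"))   # [1, 6, 8]
```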

Intuitively,

$$\lim_{\alpha\to 0} C(\alpha; \lambda, \mu) = N$$

because at low message arrival rates each message has ample time to cover all
peers. Of course, when $\alpha \to 0$ the message throughput is low, which is
undesirable.

As a consequence of our IME model, note that any message that achieves a
coverage of greater than 1 will achieve a coverage of at least $n_1$. We call this
the base coverage and we note that the average base coverage is the average
coverage of a message as the message rate becomes large:

$$C_{base}(\lambda, \mu) = \lim_{\alpha\to\infty} C(\alpha; \lambda, \mu) = n_1\frac{\lambda}{\lambda+\mu} + \frac{\mu}{\lambda+\mu}.$$

All 1 messages reach exactly $n_1$ peers (immediately on their arrival), all 0
messages reach only 1 peer (the source peer) and the number of L1 and L0
messages becomes negligible.
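The Python sketch below gives a short worked evaluation of $C_{base}$. It assumes $N = 100$ and $\lambda = \mu = 1.0$, one of the parameter settings used with Figure 6, and the function name is illustrative. In the large-$N$ limit the normalized base coverage tends to $(\lambda/(\lambda+\mu))^2$. With $\lambda = \mu$ that is one quarter of the peers.

```python
def c_base(N: float, lam: float, mu: float) -> float:
    """Average base coverage: 1 messages reach n_1 peers, 0 messages reach 1 peer."""
    n1 = N * lam / (lam + mu)
    return n1 * lam / (lam + mu) + mu / (lam + mu)

print(c_base(100, 1.0, 1.0))         # 25.5
print(c_base(100, 1.0, 1.0) / 100)   # 0.255 of the peers
```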
Since in the IME model $C$ does not approach 0 as $\alpha \to \infty$, the natural
coverage rate $\alpha C$ (or message-peer throughput) is unbounded. However, we
observe that in this case the average coverage approaches the constant $C_{base}$,
and so while the rate of messages is arbitrarily large, the messages are received
by only a fixed fraction of the peers.

For these reasons we are interested in coverage achieved beyond the base
coverage and we formulate what we call the mean extended coverage rate:

$$C^*(\alpha; \lambda, \mu) = \alpha\left(C(\alpha; \lambda, \mu) - C_{base}(\lambda, \mu)\right) \qquad (4)$$

In this paper we refer to Eq. (4) simply as the coverage rate, which has units
of message-peers per time unit. When $\alpha$ is small, although a single message will
cover all of the peers, there are not many messages and the overall number of
messages received by the peers is small; ultimately the coverage rate falls to
zero. When $\alpha$ is large, although a number of peers, $n_1$, at any one time, may
receive a large number of messages, the average coverage of a single message
approaches $C_{base}$ and the extended coverage, $C - C_{base}$, drops to 0.

To analyze Eq. (4) we derive an equation for $C(\alpha; \lambda, \mu)$ by combining
individual equations for the different message categories.

2.3.1 Fraction of appearance for each message category 

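Clearly, $\mu/(\lambda+\mu)$ of all messages will arrive while the source peer is offline,
i.e. they are 0 and L0 messages, and similarly $\lambda/(\lambda+\mu)$ of all messages are 1
and L1 messages. We need to know the fraction for each category; in Figure 4 we
use $\xi_0$ to represent the fraction of messages that are 0 and $\xi_1$ to represent the
fraction of messages that are 1. Then the fraction that are L0 is:

$$\xi_{L0} = \frac{\mu}{\lambda+\mu} - \xi_0 \qquad (5)$$

and similarly for the L1 fraction:

$$\xi_{L1} = \frac{\lambda}{\lambda+\mu} - \xi_1 \qquad (6)$$

For a given message rate $\alpha$, in the time interval $1/\lambda$ we have $\alpha/\lambda$ messages
arriving on average. The fraction $\xi_0$ is derived by summing the individual
probabilities of $j \ge 2$ messages arriving before the state change from offline to
online occurs, where we weight each event by the $j - 1$ messages that become 0
messages. This sum is divided by the average number of messages that arrive,
because we are interested in the fraction of such messages rather than the total.
It is then multiplied by the fraction of time, $\mu/(\lambda+\mu)$, that the source peer is
offline. The equation becomes:

$$\xi_0 = \frac{\lambda\mu}{\alpha(\lambda+\mu)}\int_0^{\infty}\sum_{j=2}^{\infty}(j-1)\frac{(\alpha t)^j}{j!}e^{-\alpha t}\,\lambda e^{-\lambda t}\,dt = \frac{\alpha\mu}{(\alpha+\lambda)(\lambda+\mu)} \qquad (7)$$

Similarly:

$$\xi_1 = \frac{\lambda\mu}{\alpha(\lambda+\mu)}\int_0^{\infty}\sum_{j=2}^{\infty}(j-1)\frac{(\alpha t)^j}{j!}e^{-\alpha t}\,\mu e^{-\mu t}\,dt = \frac{\alpha\lambda}{(\alpha+\mu)(\lambda+\mu)} \qquad (8)$$

Hence Eqs. (5) and (6) become:

$$\xi_{L0} = \frac{\lambda\mu}{(\alpha+\lambda)(\lambda+\mu)} \qquad (9)$$

and

$$\xi_{L1} = \frac{\lambda\mu}{(\alpha+\mu)(\lambda+\mu)} \qquad (10)$$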
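The number of messages arriving during an exponentially distributed interval of rate $r$ is geometric. In symbols, $\int_0^\infty \frac{(\alpha t)^j}{j!}e^{-\alpha t}\,r e^{-rt}\,dt = \frac{r}{\alpha+r}\left(\frac{\alpha}{\alpha+r}\right)^j$. The Python sketch below uses this identity to sum the series for $\xi_0$ and $\xi_1$ term by term. It compares the result with the closed forms $\alpha\mu/((\alpha+\lambda)(\lambda+\mu))$ and $\alpha\lambda/((\alpha+\mu)(\lambda+\mu))$. It also checks that the four category fractions sum to 1. The truncation at 10,000 terms and the function names are our choices.

```python
def xi_series(alpha: float, lam: float, mu: float, offline: bool = True,
              terms: int = 10_000) -> float:
    """xi_0 (offline=True) or xi_1 (offline=False) by summing the series.

    The integral over the interval length is replaced by the geometric
    probabilities p * q**j of j message arrivals in that interval.
    """
    r = lam if offline else mu
    p, q = r / (alpha + r), alpha / (alpha + r)
    s = sum((j - 1) * p * q**j for j in range(2, terms))
    return lam * mu / (alpha * (lam + mu)) * s

def xi_closed(alpha: float, lam: float, mu: float, offline: bool = True) -> float:
    """Closed forms for xi_0 (offline) and xi_1 (online)."""
    r, other = (lam, mu) if offline else (mu, lam)
    return alpha * other / ((alpha + r) * (lam + mu))

def xi_last(alpha: float, lam: float, mu: float, offline: bool = True) -> float:
    """xi_L0 (offline=True) or xi_L1 (offline=False)."""
    r = lam if offline else mu
    return lam * mu / ((alpha + r) * (lam + mu))

if __name__ == "__main__":
    a, l, m = 2.0, 0.5, 1.0
    print(xi_series(a, l, m, True), xi_closed(a, l, m, True))
```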



2.3.2 Coverage of each message category 

The average coverage of 0 messages, $C_0$, is trivially 1 since these messages do
not have a chance to be communicated to other peers. The average coverage of
1 messages, $C_1$, follows the single message coverage function from Eq. (1), $\bar{X}_1(t)$,
where the average time is $1/\alpha$. Since $\bar{X}_1(t)$ is non-linear we integrate the growth
over all possible times:

$$C_1 = \int_0^{\infty}\bar{X}_1(t)\,\alpha e^{-\alpha t}\,dt = N - \frac{N\alpha\mu}{\alpha(\lambda+\mu) - N\lambda\mu\log\left(1 - \frac{1}{n_0}\right)}$$

$$c_1 = \lim_{N\to\infty}\frac{C_1}{N} = \frac{\lambda(\alpha+\lambda+\mu)}{(\alpha+\lambda)(\lambda+\mu)}$$
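The large-$N$ coverage of 1 messages, $c_1 = \lambda(\alpha+\lambda+\mu)/((\alpha+\lambda)(\lambda+\mu))$, can be confirmed by integrating $x_1(t)\,\alpha e^{-\alpha t}$ numerically, where $x_1$ is Eq. (1). The Python sketch below does this with Simpson's rule, truncating the integral at $t = 40/\alpha$, where the remaining weight $e^{-40}$ is negligible. The quadrature, the truncation point and the function names are our choices.

```python
import math

def c1_closed(alpha: float, lam: float, mu: float) -> float:
    """Normalized coverage of 1 messages, lim C_1 / N."""
    return lam * (alpha + lam + mu) / ((alpha + lam) * (lam + mu))

def c1_numeric(alpha: float, lam: float, mu: float, steps: int = 20_000) -> float:
    """Simpson's rule for int_0^T x_1(t) alpha e^{-alpha t} dt with T = 40 / alpha."""
    T = 40.0 / alpha
    h = T / steps                                   # steps must be even for Simpson

    def f(t: float) -> float:
        x1 = 1.0 - mu * math.exp(-lam * t) / (lam + mu)   # Eq. (1)
        return x1 * alpha * math.exp(-alpha * t)

    s = f(0.0) + f(T)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * f(k * h)
    return s * h / 3.0

if __name__ == "__main__":
    print(c1_numeric(0.5, 0.5, 1.0), c1_closed(0.5, 0.5, 1.0))
```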

The coverage of L1 and L0 messages is more difficult to model because their
average coverage time is affected by whether the peer is online or offline when
the subsequent message arrives. If the peer is offline then the time increases
by an amount given by the average time for the peer to become online again.
We therefore further categorize these messages as L1-1, L1-0, L0-1 and
L0-0 messages, depending on whether the peer is online or offline when the
subsequent message arrives.
For L1-0 messages we integrate over the shaded regions shown in Figure 5,
essentially rounding up to the end of the $1/\lambda$ interval. For L1-1 messages we
integrate exactly to the time $t$. The integration is similar for L0-0 and L0-1
messages, except that the integration time $t = 0$ is at the beginning of a $1/\mu$
interval rather than at the beginning of a $1/\lambda$ interval. Let $\sigma = 1/\lambda + 1/\mu$; then



Figure 5: Probability distribution, $\alpha e^{-\alpha t}$, over time $t$. Shaded areas represent
average times when the peer is offline.



the integrations become: 

Cli-o = 
Cli-i = 



■0 + 1) a 



Clo-o-^Xi((j + 1) a) / 

We show here only the expressions when $N \to \infty$. For L1 messages:
Cli-o + Cli-i 



Cli = lim 

iV— too 



N 



(a + e " (A — 6° (a + A))) /i 

1 H 

f-l + e j (a + A) (A + ^) 



(11) 
Similarly for L0 messages:



LO 



lim 

N^oo 



N 



(eX ((_i+e)a-A)+A) 



(c + A) (A + >i) 



+ A (a + A + /^) 



-l+e 



(a + A) (A + ^) 



(12) 



2.3.3 Total coverage equation 

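Each of the coverage functions contributes according to the fractions in Eqs. (7),
(8), (9) and (10). Hence:

$$C = \xi_1 C_1 + \xi_{L1}\left(C_{L1-0} + C_{L1-1}\right) + \xi_0 + \xi_{L0}\left(C_{L0-0} + C_{L0-1}\right). \qquad (13)$$

Note that for the normalized coverage, $\lim_{N\to\infty} C/N$, the term $\xi_0$ (which comes
from $C_0 = 1$) vanishes, and all other terms are replaced by their normalized
counterparts.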

While inspection of the final coverage equation does offer some insight, we
omit it due to its complex structure. Figure 6 shows the results from
simulations. Each point is the average of 10 trials with $N = 100$, $\mu = 1.0$, 1000
messages transmitted and other parameters as shown. The coverage is
normalized. Clearly, as $\alpha$ increases the coverage decreases. Note that as
$\alpha \to \infty$, the coverage tends to $C_{base}$. The precision of the simulation results
decreases as $\alpha$ increases because the simulation is run for a fixed number of
messages, and hence an increased $\alpha$ leads to a decreased run time. The solid
lines represent the coverage as evaluated from Eq. (13).

Clearly, if an application requires a specific minimum coverage then $\alpha$ is
restricted to a limited range. E.g., if the application requires a coverage of at
least 0.6 of the total peers, in a case where $\lambda = \mu = 1.0$, then from Figure 6 $\alpha$
must be less than roughly 1.



2.3.4 Coverage rate 



We plot the coverage rate from Eq. (4) in Figure 7(a). When $\alpha$ is low, the
coverage rate is close to $\alpha$. The coverage rate is never equal to $\alpha$ for $\alpha > 0$,
and Figure 7(a) shows $y = \alpha$ as a reference. As $\lambda$ increases (or equivalently as
$\mu$ decreases) it is possible to achieve a higher coverage rate because peers
spend more of their time in the network.

The coverage rate saturates with large $\alpha$ and we have:

$$\lim_{\alpha\to\infty} C^*(\alpha; \lambda, \mu) = \frac{\lambda\left(e(2\lambda+\mu) - \mu\right)}{e(2\lambda+\mu)} \qquad (14)$$

These limits are shown in Figures 7(a) and 7(b) as horizontal dashed lines.
Figure 6: Averaged simulation runs (points) and theoretical curves (solid lines)
for $C$ with $\mu = 1.0$ and $\lambda$ as shown (horizontal axis: $\alpha$).

Figure 7: Coverage rate, $C^*$, (solid lines) with $\lambda$ as shown; panel (a)
$\mu = 1.0$, panel (b) $\mu = 100.0$ (horizontal axis: $\alpha$, logarithmic scale).



Note that: 

lim C*(a;A, /i) = — — ^ — , 

which is a consequence of there being no message transmission rate limits on 
individual peers. 

Figure 7(b) shows that the coverage rate exhibits a local maximum which
approaches Eq. (14). Hence, for parameters in these ranges it is possible to
achieve a close to maximum coverage rate at a relatively small $\alpha$. For example,
when $\lambda = 0.5$ and $\mu = 100.0$, Figure 7(b) shows that $\alpha \approx 20$ suffices to achieve
a coverage rate that is very close to the maximum, which would not otherwise
be met until $\alpha \approx 1000$.

Simulation results (not reported here) show that these coverage rates are 
highly susceptible to deviations from the average coverage. Therefore, this anal- 
ysis can serve as a rough guide, in the sense that while a value of a may be 
computed as giving a particular coverage rate, an observed coverage rate (which 
is necessarily over a finite range) is likely to deviate from the theoretical predic- 
tion. 

3 Message buffer and multiple sources 

In this section we formulate the coverage for the cases when peers can buffer k 
messages and when there are multiple sources of messages. 

3.1 Using a k-buffer

We calculate coverage for the $k$-buffer case similarly to the unit buffer case,
dividing messages into separate categories. In this section we redefine the $\xi_1$
and $\xi_0$ fractions with respect to the $k$-buffer case. We also define further
fractions.

3.1.1 Message categories and their fractions 

As in the unit buffer case we consider the fraction of messages that arrived in
the period when the source peer was either online or offline and were pushed
from the buffer before the peer changed its state, which we call $\xi_{1-k}$ and $\xi_{0-k}$,
meaning the fraction of 1 and 0 messages respectively, where the buffer size is
equal to $k$. In this case we calculate the fraction $\xi_{0-k}$ by summing the individual
probabilities of $j > k$ messages arriving before the state changes from offline to
online, where we weight each event by the $j - k$ messages that become 0 messages.
As in the unit buffer case, this is divided by the average number of messages
that arrive. So, the equations become:

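$$\xi_{0-k} = \frac{\lambda\mu}{\alpha(\lambda+\mu)}\int_0^{\infty}\sum_{j=k+1}^{\infty}(j-k)\frac{(\alpha t)^j}{j!}e^{-\alpha t}\,\lambda e^{-\lambda t}\,dt. \qquad (15)$$

Similarly:

$$\xi_{1-k} = \frac{\lambda\mu}{\alpha(\lambda+\mu)}\int_0^{\infty}\sum_{j=k+1}^{\infty}(j-k)\frac{(\alpha t)^j}{j!}e^{-\alpha t}\,\mu e^{-\mu t}\,dt. \qquad (16)$$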
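The number of messages arriving during an exponentially distributed interval is geometric, so the series in Eqs. (15) and (16) can be summed in closed form. By the memoryless property of the geometric count we obtain the following closed forms. They are our own derivation and are not stated in the text:

$$\xi_{0-k} = \frac{\mu}{\lambda+\mu}\left(\frac{\alpha}{\alpha+\lambda}\right)^k, \qquad \xi_{1-k} = \frac{\lambda}{\lambda+\mu}\left(\frac{\alpha}{\alpha+\mu}\right)^k.$$

For $k = 1$ these reduce to the unit-buffer fractions $\xi_0$ and $\xi_1$, and for $k = 0$ they give $\mu/(\lambda+\mu)$ and $\lambda/(\lambda+\mu)$. The Python sketch below checks the closed forms against direct summation. The function names are illustrative.

```python
def xi_k_series(alpha: float, lam: float, mu: float, k: int,
                offline: bool = True, terms: int = 10_000) -> float:
    """Eq. (15) (offline=True) or Eq. (16) (offline=False), summed term by term."""
    r = lam if offline else mu
    p, q = r / (alpha + r), alpha / (alpha + r)   # geometric count of arrivals
    s = sum((j - k) * p * q**j for j in range(k + 1, terms))
    return lam * mu / (alpha * (lam + mu)) * s

def xi_k_closed(alpha: float, lam: float, mu: float, k: int,
                offline: bool = True) -> float:
    """Closed form from the memoryless property (our derivation)."""
    r, other = (lam, mu) if offline else (mu, lam)
    return other / (lam + mu) * (alpha / (alpha + r)) ** k

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, xi_k_series(2.0, 0.5, 1.0, k), xi_k_closed(2.0, 0.5, 1.0, k))
```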

Unlike the unit buffer case we now have to consider $k$ different types of both L1
and L0 messages. We say that a message is an L1-$i$ or L0-$i$ message to mean
that $(i - 1)$ further messages arrived after it before the peer changed state.

Each L1-$i$ and L0-$i$ message will have its own rate and coverage because
each of them will have a different coverage time. This is because there is, e.g.,
an average time of $1/\alpha$ between message L1-$k$ and message L1-$(k-1)$, and
so on.

Figure 8: Arrival of messages, while the source peer is online, with $k = 2$ and
showing the fractions of different message types.

An example for $k = 2$ is shown in Figure 8. In the figure:

1-2: Message 1 is propagated until message 3 arrives; similarly message 2
propagates until message 4 is generated. In other words both messages
1 and 2 get coverage until 2 more messages arrive.

L1-2: Message 3 is propagated until the peer goes offline, and then it
continues to propagate until at least one more message has arrived after
message 4 and the peer has come back online.

L1-1: Message 4 is propagated until at least two more messages are
generated, similarly to the previous example.



Figure 9: Computing the fraction ξ_{LI-i}.
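We derive ξ_{LI-i}, i = 1, 2, ..., k, as shown in Figure 9:

$$\xi_{LI-i} = \xi_{I-(i-1)} - \xi_{I-i}.$$

Similarly:

$$\xi_{LO-i} = \xi_{O-(i-1)} - \xi_{O-i}.$$

In the above equations the fractions ξ_{I-(i-1)}, ξ_{I-i} and ξ_{O-(i-1)}, ξ_{O-i} are calculated using Equations (16) and (15), respectively, substituting i − 1 or i for k. The difference counts the messages followed by at least i − 1 but fewer than i further arrivals before the state change, i.e. exactly i − 1 of them, which is the definition of an LI-i (or LO-i) message; for i = 1 the first term is ξ_{I-0} = 1 (respectively ξ_{O-0} = 1).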
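To make the message categories concrete, the following Monte Carlo sketch simulates online periods of length Exp(μ) with Poisson(α) message arrivals and classifies every message by how many further messages arrive before the period ends. Summing Eq. (16) gives ξ_{I-k} = q^k with q = α/(α + μ), so the differences above reduce to ξ_{LI-i} = q^(i−1)(1 − q); these closed forms are our own simplification and are what the estimates should approach. The function name `estimate_online_fractions` is hypothetical, and the LO fractions follow in the same way with λ in place of μ.

```python
import random


def estimate_online_fractions(alpha, mu, k, periods=200_000, seed=0):
    """Monte Carlo estimate of xi_{I-k} and xi_{LI-i}, i = 1..k, for one source.

    Online periods last Exp(mu); messages arrive as a Poisson(alpha) stream.
    A message followed by at least k arrivals in the same period is an I-k
    message; one followed by exactly i - 1 arrivals is an LI-i message.
    """
    rng = random.Random(seed)
    total = pushed = 0
    last = [0] * (k + 1)              # last[i] counts LI-i messages; index 0 unused
    for _ in range(periods):
        length = rng.expovariate(mu)
        n, t = 0, rng.expovariate(alpha)
        while t < length:             # count the arrivals inside this period
            n += 1
            t += rng.expovariate(alpha)
        total += n
        pushed += max(n - k, 0)       # the n - k oldest are pushed out in-period
        for i in range(1, min(n, k) + 1):
            last[i] += 1              # the i-th newest message is an LI-i message
    return pushed / total, [last[i] / total for i in range(1, k + 1)]


alpha, mu, k = 2.0, 1.0, 3
q = alpha / (alpha + mu)
xi_k, xi_li = estimate_online_fractions(alpha, mu, k)
print(xi_k, q**k)
for i, est in enumerate(xi_li, start=1):
    print(i, est, q ** (i - 1) * (1 - q))
```

Every message is either an I-k message or exactly one of the LI-i types, so the estimated fractions always add up to one, mirroring the telescoping of the differences above.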
3.1.2 Coverage of I-k messages

All I-k messages propagate until k subsequent messages have been generated. So we look at the probability of the k-th subsequent message arriving at time t. We use the Erlang distribution to compute the probability of the k-th message arriving:

$$C_{I-k} = \int_0^\infty X_I(t)\,\frac{\alpha^k t^{k-1}}{(k-1)!}\,e^{-\alpha t}\,dt. \qquad (17)$$
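Eq. (17) is an expectation of X_I(T) where T, the time until the k-th subsequent arrival, is Erlang(k, α). The sketch below evaluates it with the trapezoidal rule for any coverage function passed in. X_I itself comes from the unit buffer analysis and is not reproduced in this section, so the functions used here are placeholders for illustration only, and the names `erlang_pdf` and `coverage_I_k` are hypothetical.

```python
import math


def erlang_pdf(t, k, alpha):
    """Erlang(k, alpha) density: time until the k-th arrival of a Poisson(alpha) stream."""
    return alpha**k * t ** (k - 1) * math.exp(-alpha * t) / math.factorial(k - 1)


def coverage_I_k(X_I, k, alpha, steps=20_000):
    """Trapezoidal evaluation of Eq. (17) for a user-supplied coverage function X_I."""
    t_max = (k + 40) / alpha            # far enough into the Erlang tail
    h = t_max / steps
    total = 0.0
    for n in range(steps + 1):
        t = n * h
        weight = 0.5 if n in (0, steps) else 1.0
        total += weight * X_I(t) * erlang_pdf(t, k, alpha)
    return total * h


# Illustrative placeholder, NOT the paper's X_I: X(t) = 1 - exp(-t).
# For T ~ Erlang(2, 1), E[1 - exp(-T)] = 1 - (1/2)**2 = 0.75.
print(coverage_I_k(lambda t: 1 - math.exp(-t), k=2, alpha=1.0))
```

Two quick sanity checks: X(t) = 1 recovers the total probability 1, and X(t) = t recovers the Erlang mean k/α.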
3.1.3 Coverage of LI-i messages

As in the unit buffer case, we consider LI-1-i and LI-0-i messages. A message is propagated while it exists in the buffer; to be pushed out of the buffer, k more messages have to arrive into the buffer. Because we are considering only time periods after the period in which message i was generated, we know that (i − 1) messages have already arrived. Thus only k − (i − 1) more messages have to arrive to push the LI-i message out of the buffer. We are interested in the probability of the i-th message arriving in a subsequent online period. The
propagation starts from the point when the i-th message was generated, i.e we 
are interested in the coverage after time t+^^. We use the Erlang distribution 
again: 



oo ^{j + l)a ■ -, fc_(j-l),fc-j 



Similarly, for LI-0-i messages we modify the result for LI-1-i to get:

ClI-O-i = 



j^o ^ Jja {k-i)\ 

3.1.4 Coverage of LO-i messages

We use the equations derived for the unit buffer case, except that, as for LI-i messages, we wait until the (k − (i − 1))-th message arrives:



CLO-o-i = ^X((i + l)a) / e-"Ut, 
Note that the time within X_O has no offset because it does not matter how recently a message arrived before the peer became online; propagation starts only after the peer becomes online.

3.1.5 Total coverage equation for the k-buffer case

We add the coverage of each message type to get the total coverage:

$$C(\alpha; \lambda, \mu, k) = C_{I-k} + \sum_{i=1}^{k}\left(C_{LI-0-i} + C_{LI-1-i}\right) + \sum_{i=1}^{k}\left(C_{LO-0-i} + C_{LO-1-i}\right).$$

Figure 10 shows theoretical results for k = 1, 2, 3 and 5 with fixed λ = μ = 1. Figure 11 shows a comparison with simulation runs with N = 100, μ = 1 and 1000 messages in the network. Figure 12 shows the theoretical increase in coverage as k increases, for various λ. E.g., to achieve a coverage of at least 0.7 when λ = 0.5 we need a buffer size of at least 4. Increasing k clearly increases the coverage. Furthermore, lim_{α→∞} C(α; λ, μ, k) = lim_{α→∞} C(α; λ, μ, 1), since the fraction of LI-k-i and LO-k-i messages becomes insignificant.

3.2 Multiple source model 

In this section we consider the case when multiple peers are generating messages. We maintain the message arrival rate α on a network-wide basis, i.e. if there are N_s sources in the network, each peer generates messages with rate α/N_s. We make the following simplifications:

• We ignore O messages, i.e. messages that occur on a source peer while it is offline. These kinds of messages were included in the previous sections and could be removed if desired. This simplification limits our model in this section to applications where messages are only generated on peers that are currently online.

• We assume that N_s is sufficiently large. Small values of N_s, experimentally determined to be less than about 10, lead to a large variety of message classes that we have not yet simplified. Practical values of N_s are easily large enough to justify this simplification.



Figure 10: Theoretical results for C(α; λ, μ, k) with μ = λ = 1.0, k as shown.



The simplifications allow the multiple source case to be a direct result of the 
single source case. 

In our model, arriving messages are randomly assigned to one of the N_s peers, and so if N_s is sufficiently large then a new message arriving at a peer has a probability of λ/(μ + λ) of arriving on an online peer (these are the I messages); otherwise it arrives on an offline peer (these are the O messages). Our analysis nevertheless takes into account that some source peers may be offline at times, by allowing the possibility of O messages but then ignoring them in the coverage rate equations. The O messages never enter the buffer either, so they do not reduce the coverage that way.
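The probability λ/(μ + λ) is the long-run fraction of time a peer spends online when online periods last Exp(μ) and offline periods last Exp(λ); because message arrivals are Poisson, they see this time average. The sketch below, with the hypothetical helper `online_fraction`, estimates that fraction for a single peer by simulating alternating online and offline periods, assuming each peer's churn is independent of the others.

```python
import random


def online_fraction(lam, mu, horizon=200_000.0, seed=0):
    """Long-run fraction of time one peer spends online.

    Online periods last Exp(mu) (mu: online -> offline rate) and offline
    periods last Exp(lam) (lam: offline -> online rate).
    """
    rng = random.Random(seed)
    t, online_time, online = 0.0, 0.0, True
    while True:
        d = rng.expovariate(mu if online else lam)
        end = min(t + d, horizon)       # clip the final period at the horizon
        if online:
            online_time += end - t
        if end >= horizon:
            return online_time / horizon
        t, online = end, not online


for lam, mu in [(1.0, 1.0), (0.5, 1.0), (2.0, 1.0)]:
    print(lam, mu, online_fraction(lam, mu), lam / (lam + mu))
```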

Under these circumstances, we place I messages in different classes, determined by the number of subsequent O messages that occur before the next I message, making i messages in total. Each message is an independent Bernoulli trial (an I message with probability λ/(λ + μ)), so the class probability is geometric, (μ/(λ + μ))^(i−1) · λ/(λ + μ), and the time for i messages to occur is given by the Erlang distribution, similarly to Eq. (17). We arrive at a coverage for the unit buffer size:



Figure 11: Averaged simulation runs (points) and theoretical curves (solid lines) for C(α; λ, μ, k) with μ = 1.0, k = 3 and λ as shown.

Note that the coverage of a message in the multisource case is the same for all of the sources. If we consider a buffer of size k, then we need to consider the arrival of k I messages that are interspersed among the i ≥ k messages; since the last of these i messages is the k-th I message, there are (i − 1 choose k − 1) combinations. Therefore we obtain:

Figure 13 shows the coverage over a large range of α for N_s = N = 100 nodes and k = 1. Different values of λ are shown. The coverages in this section cannot, in general, be compared with the coverages of the previous section because the previous section included O messages, which reduced the coverage. Note that the theoretical coverage can be seen, especially for λ = 0.5, to be slightly lower than the simulation. This is due to the assumption that N_s is sufficiently large. The assumption becomes worse as λ becomes smaller because the effective number of source peers that are online is reduced.
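Under the large-N_s assumption, each message independently lands on an online source with probability p = λ/(λ + μ), so the weight of the class in which the k-th subsequent I message is the i-th subsequent message is the negative binomial probability (i − 1 choose k − 1) p^k (1 − p)^(i − k); for k = 1 this reduces to the geometric weight of the unit buffer case. The sketch below (hypothetical helper `class_weight`) checks that these weights sum to one and have mean k/p. It covers only the class weights; the Erlang time integrals over the coverage function are not reproduced.

```python
from math import comb


def class_weight(i, k, lam, mu):
    """Probability that the k-th subsequent I message is the i-th subsequent message."""
    if i < k:
        return 0.0
    p = lam / (lam + mu)            # chance a message lands on an online source
    return comb(i - 1, k - 1) * p**k * (1 - p) ** (i - k)


k, lam, mu = 3, 1.0, 1.0
classes = range(k, 2000)            # truncate the infinite sum over i
weights = [class_weight(i, k, lam, mu) for i in classes]
print(sum(weights))                               # approximately 1
print(sum(i * w for i, w in zip(classes, weights)))  # approximately k / p = 6
```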

The increase in coverage versus k is shown in Figure 14. Numerical computation of the theoretical values became inaccurate beyond k = 20. Note that the chart is for the case α = 10. Also, the theory again slightly undershoots the simulation result as λ becomes smaller.

Figure 12: Theoretical increase in C(α; λ, μ, k) versus k for μ = 1.0 and λ as shown.

Be aware that the values of α shown in Figures 13 and 14 are "net" or "total" message rates. Each peer provides messages at a rate of only α/N_s. Therefore, for N = N_s = 100 and α = 100, the effective rate of a message stream from a given peer is only 1 per second. Thus, the coverage rate for that peer is similarly lower, even though the coverage of the messages is the same. In other words, considering Figure 13, if we look at the coverage at α = 100 we should consider the rate of messages from a single peer to be only 1, and the coverage rate is then 100 times worse than in the single source case. Increasing the buffer size can allow us to increase α without sacrificing coverage, and hence to maintain a steady effective message rate per peer.

Figure 15 shows the theoretical smallest value of k that maintains a given coverage as α increases. Clearly, the buffer requirement increases in proportion to the message rate.

4 Conclusion 

We have proposed the Instantaneous Message Exchange (IME) model as a fundamental approach for analyzing the effect of churn on streaming message rates. We derived very accurate equations to describe the behavior of the P2P system, and we showed how the equations can be used in various ways to determine good system settings. E.g., we can choose appropriate limits on message transmission rates with respect to churn (or vice versa) in order to achieve high message throughput. We can also see how buffer size enhances message throughput and how the number of source peers affects these relationships.

In our analysis we have attempted to provide the most accurate descriptions of the system over all ranges of parameters. In a number of cases the equations are complicated, including three or four complicated terms that are significant at different ranges of the parameters. The IME model was instrumental in allowing us to derive these equations. It remains to be seen whether we can maintain the accuracy of the theoretical work while moving towards a non-instantaneous model.

Figure 13: Averaged simulation runs (points) and theoretical curves (solid lines) for C versus α with μ = 1.0, λ = 1.0, and k = 1.

Future work includes: (i) including peer bandwidth and network delay limitations, (ii) examining more general communication patterns, (iii) using a Pareto distribution or another more suitable distribution derived from trace data, and (iv) developing algorithms that reach the maximum coverage rates.

Simulation tool

We developed a basic simulation tool to test our models. In a nutshell, the simulator is event based and maintains states for N peers, including whether each one is online or offline, which messages it has received or sent, etc. The events of the simulator include message generation events (i.e. a peer generates a message) and transition events from online to offline and vice versa. Events for the transmission of a message from one peer to another are not part of the IME model and are therefore not required in the simulation. Message transmission is purely a consequence of message generation, peer online/offline states, and transitions between these states.

The simulation begins by assigning each peer as online or offline, with probability λ/(λ + μ) and μ/(λ + μ), respectively. In the single message case the simulation then chooses a peer at random and runs until the message has been received by all peers. For the message stream case the simulation continues to run until a given number of messages have been generated. A sufficiently large number of messages needs to be generated for the observed results to be independent of the starting configuration of online/offline peers; this number depends on μ, λ and α. An appropriate number of messages was set experimentally. For the multiple source case the simulation randomly chooses N_s peers as source peers.

Figure 14: Averaged simulation runs (points) and theoretical curves (solid lines) for coverage versus k for N_s = N = 100, μ = 1, α = 10 and λ as shown.

Event times are real numbers, and the simulator orders all pending events and processes one event at a time. The following is a brief explanation of what happens after each event:

1. Message generation: 

(a) Peer was online: 

• Send new message to all of the online peers. 

• Compute next message generating event for that peer. 

(b) Peer was offline: 

• Compute next message generating event for that peer. 

2. Changing from online to offline: 

• Schedule the next time to change to online. 

3. Changing from offline to online: 

• Merge buffers with all online peers to produce the buffer with the newest messages; see notes below.

• Send that buffer to all of the online peers.

• Schedule the next time to change to offline.

Figure 15: Theoretical smallest value of k that gives C as shown, for μ = λ = 1 over the α range.



Notes:

• An invariant of the simulation is that at any point in time the buffers of the online peers are identical. This is a consequence of the IME model. If bandwidth or latency for message transmission were taken into account, the invariant would be broken.

• Each message is assigned a number such that message i is newer than message j if and only if j < i, i.e. message i was generated later than message j. A buffer is always sorted by message number. The buffer changes due to: (i) a new message coming into the network; or (ii) a peer that has newer messages than some of the messages in the online peers' buffers entering the network. In the first case the new message is appended to the buffer when the current buffer size is less than k; if the current buffer size is equal to k, the oldest message in the buffer is pushed out. In the second case the buffers of the newly arrived peer and any online peer are merged so that the merged buffer holds the k newest messages of the union of those two older buffers. Each of the online peers then has the merged buffer.
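The event rules and notes above can be condensed into a compact event-driven sketch. It is our own illustration, not the authors' tool: the function name `simulate_ime` and its parameters are hypothetical, messages generated on an offline source are assumed to stay in that source's buffer (as in the single source analysis; the multiple source model would discard them), and the sketch only records how many distinct peers held each message rather than computing C. Messages generated near the end of a run are undercounted because the run stops as soon as the requested number of messages exists. Because the buffers of online peers are identical, the merge step reads any one online peer's buffer.

```python
import heapq
import random


def simulate_ime(n_peers, lam, mu, alpha, k, n_messages, sources, seed=1):
    """Event-driven sketch of an IME simulation.

    Returns, for each generated message, the number of distinct peers that held it.
    """
    rng = random.Random(seed)
    p_on = lam / (lam + mu)
    online = [rng.random() < p_on for _ in range(n_peers)]
    buffers = [[] for _ in range(n_peers)]   # message ids, oldest first, len <= k
    holders = []                             # holders[m]: peers that held message m
    events = []                              # min-heap of (time, kind, peer)
    for p in range(n_peers):
        first = rng.expovariate(mu if online[p] else lam)
        heapq.heappush(events, (first, "toggle", p))
    rate = alpha / len(sources)              # alpha is the network-wide rate
    for s in sources:
        heapq.heappush(events, (rng.expovariate(rate), "msg", s))

    def broadcast(buf):
        # IME: every online peer instantly receives the same buffer.
        for q in range(n_peers):
            if online[q]:
                buffers[q] = list(buf)
                for m in buf:
                    holders[m].add(q)

    while len(holders) < n_messages:
        t, kind, p = heapq.heappop(events)
        if kind == "msg":
            m = len(holders)                 # larger id means newer message
            holders.append({p})
            buffers[p] = (buffers[p] + [m])[-k:]  # oldest pushed out when full
            if online[p]:
                broadcast(buffers[p])
            heapq.heappush(events, (t + rng.expovariate(rate), "msg", p))
        elif online[p]:                      # online -> offline
            online[p] = False
            heapq.heappush(events, (t + rng.expovariate(lam), "toggle", p))
        else:                                # offline -> online
            online[p] = True
            others = [q for q in range(n_peers) if online[q] and q != p]
            shared = buffers[others[0]] if others else []
            merged = sorted(set(shared) | set(buffers[p]))[-k:]  # k newest of union
            broadcast(merged)
            heapq.heappush(events, (t + rng.expovariate(mu), "toggle", p))
    return [len(h) for h in holders]


counts = simulate_ime(n_peers=20, lam=1.0, mu=1.0, alpha=2.0, k=2,
                      n_messages=300, sources=[0])
print(sum(counts) / len(counts))   # mean number of peers reached per message
```

Passing `sources=list(range(n_peers))` turns the same sketch into a multiple source run, with each source generating at rate α/N_s.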